HITS-based Seed Selection and Stop List Construction for Bootstrapping

نویسندگان

  • Tetsuo Kiso
  • Masashi Shimbo
  • Mamoru Komachi
  • Yuji Matsumoto
چکیده

In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graphbased approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A corpus-based bootstrapping algorithm for Semi-Automated semantic lexicon construction† ELLEN RILOFF and JESS ICA SHEPHERD

Many applications need a lexicon that represents semantic information but acquiring lexical information is time consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-specific semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of ‘seed words’ that belong to a semantic class of interest. The algori...

متن کامل

A corpus-based bootstrapping algorithm for Semi-Automated semantic lexicon construction

Many applications need a lexicon that represents semantic information but acquiring lexical information is time consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-speciic semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of \seed words" that belong to a semantic class of interest. The algorit...

متن کامل

Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms

Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate the semantic drift of bootstrapping has the same root as the topic drift of Kleinberg’s HITS, using a simplified graphbased reformulation of bootstrapping. We confirm that two graph-based algorithms, the von Neumann kernels and the regularized Laplac...

متن کامل

Automatic Construction of Chinese Stop Word List

In modern information retrieval systems, effective indexing can be achieved by removal of stop words. Till now many stop word lists have been developed for English language. However, no standard stop word list has been constructed for Chinese language yet. With the fast development of information retrieval in Chinese language, exploring Chinese stop word lists becomes critical. In this paper, t...

متن کامل

Understanding seed selection in bootstrapping

Bootstrapping has recently become the focus of much attention in natural language processing to reduce labeling cost. In bootstrapping, unlabeled instances can be harvested from the initial labeled “seed” set. The selected seed set affects accuracy, but how to select a good seed set is not yet clear. Thus, an “iterative seeding” framework is proposed for bootstrapping to reduce its labeling cos...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011